(54) Speech coding apparatus and method using a filter for enhancing signal quality



(57) A speech modification or enhancement filter,
and apparatus, system and method using the same.
Synthesized speech signals are filtered to generate
modified synthesized speech signals. From spectral
information represented as a multi-dimensional vector,
a filter coefficient is determined so as to ensure that
formant characteristics of the modified synthesized
speech signals are enhanced in comparison with those
of the synthesized speech signals and in accordance
with the spectral information. The spectral information
can be any one of LSP information, PARCOR information
and LAR information. A degree of freedom of
design of the speech modification filter used for the
aural suppression of quantizing noise contained in the
synthesized speech signals is thus heightened, leading
to the improvement of intelligibility of said synthesized
speech signals. A good formant enhancement effect
can be obtained without allowing any perceptible level of
distortions to occur within a range of permissible spectral
gradients.
Description

BACKGROUND OF THE INVENTION

a) Field of the Invention

The present invention relates generally to a system and a method for transmitting or storing speech information by
means of codes having a lower information content than that of input speech signals. This invention relates in particular
to a system and a method for extracting from the input speech signals parameters indicative of their characteristics,
transmitting or storing the extracted parameters, and synthesizing the original speech signals on the basis of the
transmitted or stored parameters. More specifically, the invention is directed to a speech modification filter for aurally
suppressing quantizing noise occurring in the synthesized speech signals. Further, the present invention relates to a
system, a method and a filter for enhancing the quality of the signal, such as speech intelligibility. More specifically, the
present invention relates to a speech enhancement which is suitable for improving the speech intelligibility of the signal
having distortions caused by analog transmission or the signal received by the hard-of-hearing aid apparatus and which
is suitable for improving the brightness of the speech to be broadcast or to be output by a loudspeaker.

b) Description of the Related Art

A configuration of a speech analysis/synthesis system is illustrated by way of example in Fig. 28. The system in this
diagram comprises an analyzing unit 100 and a synthesizing unit 200. The analyzing unit 100 includes an analyzer 101
and a coder 102, whilst the synthesizing unit 200 includes a decoder 201 and a synthesizer 202. In some applications the
units 100 and 200 are linked to each other through communication channels, one unit typically being remote from the
other. In other applications the unit 100 transmits information through storage media to the unit 200, wherein the two
units may constitute a single apparatus or two separate apparatuses. The analyzer 101 extracts, from input speech
signals supplied from a user, a parameter group which includes spectral information indicative of characteristics of the
input speech signals. The extracted parameter group is coded by the coder 102 and is fed through the communication
channels or the storage media to the synthesizing unit 200, in which the coded parameter group is decoded by the decoder
201. The synthesizer 202 serves to synthesize speech signals on the basis of the thus decoded parameter group. One
advantage of the system having such a configuration lies in the lower information content of the transmitted or stored
signals. This is attributable to the fact that the transmitted or stored signals, that is, the coded parameter group, contain
a lower information content compared with the input speech signals.

A variant of the synthesizing unit 200 is illustrated in Fig. 29. This variant further comprises a postfilter 203 serving
to subject speech signals derived from the synthesizer 202 (hereinafter referred to as synthesized speech signals) to a
predetermined modification process on the basis of the [...]
signals (hereinafter referred to as modified synthesized speech signals). The postfilter 203 is used in some applications
to aurally suppress the quantizing noise contained in the synthesized speech signals, but in other applications it is used
to improve subjective quality such as speech intelligibility. In the following description the postfilter of this type will be
referred to as a speech modification filter or a speech enhancement filter. The synthesizing unit 200 provided with such
a filter 203 is suited for use in a voice coding/decoding system or a voice recognition and response system.

[...] of a type enhancing formant characteristics has
the advantage of being significantly effective in suppression of the quantizing noise and in improvement of the
subjective quality. Prior art references disclosing such a filter include, for example:

Japanese Patent Laid-open Pub. No. Sho64-13200 (hereinafter referred to as reference 1);
Japanese Patent Laid-open Pub. No. Hei5-500573 (hereinafter referred to as reference 2);
Japanese Patent Laid-open Pub. No. Hei2-82710 (hereinafter referred to as reference 3); and
"Speech Coding System Based on Adaptive Mel-Cepstral Analysis for Noisy Channel", Proceeding of Spring Meeting
of Acoustical Society of Japan, Vol. 1, pp. 257-258 (1994. 3) (hereinafter referred to as reference 4).

Filters set forth in the references 1 and 2 are both used as the speech modification filter 203 in the synthesizing unit
200 which receives LPCs as the above-described coded parameter group from the analyzing
unit 100. A filter set forth in the reference 3 is used as the speech modification filter 203 in the synthesizing unit 200
which receives autocorrelation coefficients as the above-described coded parameter group from the analyzing unit 100.
Finally, a filter set forth in the reference 4 is used as the speech modification filter 203 in the synthesizing unit 200 which
receives mel-scaled cepstrum or mel-cepstrum as the above-described parameter group from the analyzing unit 100.

Fig. 30 illustrates a schematic configuration of the filter disclosed in the reference 1. This filter 203 receives
decoded LPCs from the decoder 201 in addition to the synthesized speech signals fed from the synthesizer 202. The
LPCs referred to herein mean α parameters obtained by linear prediction coding to be executed by the analyzer 101
depicted in Fig. 28. The linear prediction coding is a method for determining, on the basis of sampled values of input
speech signal waveforms and in accordance with the linear prediction method, α parameters or filter coefficients of
filters of, e.g., orders eight to twelve modeling a human vocal mechanism.

The filter 203 shown in Fig. 30 includes a filter 204 for filtering synthesized speech signals to generate semi-modified
synthesized speech signals, and a filter 205 for filtering the semi-modified synthesized speech signals to generate
modified synthesized speech signals, the filters 204 and 205 both using α parameters as their filter coefficients. It is to
be noted that the α parameter used in the filter 204 is not the α parameter $\alpha_i$ (where i = 1, 2, ..., p; p being a
prediction order) fed from the decoder 201, but $\alpha1_i = \alpha_i \nu^i$ obtained by modifying the α parameter
$\alpha_i$ with a modified coefficient ν. In the same manner the α parameter for use in the filter 205 is
$\alpha2_i = \alpha_i \eta^i$ obtained by modifying the α parameter $\alpha_i$ with a modified coefficient η. The process
for modifying the α parameter $\alpha_i$ with the modified coefficients ν and η is executed
by LPC modification sections 206 and 207, respectively.

Now assume that the filters 204 and 205 implement a denominator and a numerator, respectively, of a transfer
function H(z) for transforming the synthesized speech signals into the modified synthesized speech signals. In other
words, let the filters 204 and 205 be an LPC filter and an inverse-LPC filter, respectively. Furthermore, filtering using
the α parameter $\alpha_i$ as the filter coefficients is assumed to be given as:

$$A(z) = \sum_{i=0}^{p} \alpha_i z^{-i} \qquad (1)$$

where z is a z transformation operator. Since the filter coefficients used in the filters 204 and 205 are respectively
$\alpha1_i = \alpha_i \nu^i$ and $\alpha2_i = \alpha_i \eta^i$ as described above, the transfer functions of the filters 204 and
205 are respectively represented in the form of $1/A(z/\nu)$ and $A(z/\eta)$. Therefore the transfer function for
transforming the synthesized speech signals into modified synthesized speech signals can be expressed as:

$$H(z) = A(z/\eta)/A(z/\nu) \qquad (2)$$
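Expressions (1) and (2) can be illustrated with a minimal Python sketch. It assumes the convention that the coefficient of order 0 equals 1, and the function names `bandwidth_expand`, `iir_filter` and `postfilter`, as well as the default values of ν and η, are illustrative assumptions rather than anything taken from reference 1.

```python
import numpy as np

def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): the i-th coefficient becomes a[i] * gamma**i."""
    a = np.asarray(a, dtype=float)
    return a * gamma ** np.arange(len(a))

def iir_filter(b, a, x):
    """Direct-form filter: y[n] = (sum_k b[k] x[n-k] - sum_{k>=1} a[k] y[n-k]) / a[0]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y[n] = acc / a[0]
    return y

def postfilter(x, a, nu=0.8, eta=0.5):
    """Apply H(z) = A(z/eta) / A(z/nu) to synthesized speech x (assumed a[0] == 1)."""
    numerator = bandwidth_expand(a, eta)    # role of filter 205: A(z/eta)
    denominator = bandwidth_expand(a, nu)   # role of filter 204: 1/A(z/nu)
    return iir_filter(numerator, denominator, x)
```

When ν and η are equal the numerator and denominator cancel, so the sketch passes the signal through unchanged; this is a convenient sanity check on the coefficient modification.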

Fig. 31 schematically illustrates a configuration of the filter disclosed in the reference 2. In this filter 203, $\alpha1_i$
generated in the LPC modification section 206 is transformed by an LPC/ACC transform section 208 from an LPC domain
into an autocorrelation domain, and is subjected to a bandwidth expansion within the autocorrelation domain by an ACC
modification section 209, and in accordance with Levinson recursion, is transformed by an ACC/LPC transform section
210 from the autocorrelation domain into the LPC domain. The filter 205 receives $\alpha2_i$ obtained in this manner. Although
the LPC modification section 207 shown in Fig. 30 is removed in this diagram, the reference 2 also suggests a
configuration including the LPC modification section 207 whose output $\alpha2_i$ is again modified by the LPC/ACC transform
section 208, ACC modification section 209 and ACC/LPC transform section 210.
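The autocorrelation-domain steps named above (bandwidth expansion followed by Levinson recursion) can be sketched as follows. This is only an illustration under stated assumptions: the Gaussian form of the lag window, the function names `lag_window` and `levinson`, and the sign convention A(z) = 1 + a_1 z^-1 + ... are choices made here, not details confirmed by reference 2.

```python
import numpy as np

def lag_window(r, f0, fs):
    """Bandwidth expansion by a Gaussian lag window (assumed form):
    r[k] is scaled by exp(-0.5 * (2*pi*f0*k/fs)**2)."""
    k = np.arange(len(r))
    return np.asarray(r, dtype=float) * np.exp(-0.5 * (2.0 * np.pi * f0 * k / fs) ** 2)

def levinson(r, order):
    """Levinson recursion: autocorrelation r[0..order] -> coefficients of
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order."""
    r = np.asarray(r, dtype=float)
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for m in range(1, order + 1):
        # Correlation of the current predictor with the next lag.
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])
        k = -acc / err                      # reflection coefficient of stage m
        prev = a.copy()
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k                  # residual prediction error
    return a
```

A larger `f0` smooths the spectrum more strongly, which corresponds to the trade-off discussed in the text between flattening the spectral gradient and distorting the formants.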

Fig. 32 illustrates a schematic configuration of a filter disclosed in the reference 3. This filter 203 is so configured
as to have ACC/LPC transform sections 211 and 212 in addition to the configuration of the reference 1. The ACC/LPC
transform section 211 receives autocorrelation constants as spectral information included in decoded parameter group
and then transforms the received autocorrelation constants from the autocorrelation domain into the LPC domain. The
ACC/LPC transform section 212 receives a part of order m (m < p) or less of the autocorrelation constants to be
received by the ACC/LPC transform section 211 and then transforms the received autocorrelation constants from the
autocorrelation domain into the LPC domain. The LPC modification sections 206 and 207 modify α parameters derived
from the ACC/LPC transform sections 211 and 212, respectively, in the same manner as the reference 1. It is to be
appreciated that the autocorrelation constants to be provided as input in this configuration may be ones which have
been decoded by the decoder 201 (that is, autocorrelation constants obtained through calculation by the analyzer 101
and through coding by the coder 102), or may be ones which have been calculated by the decoder 201 or synthesizer
202 on the basis of different type of spectral parameters decoded in the decoder 201.

Figs. 33 to 35 represent log-power vs. frequency spectrum characteristics of the speech modification (or enhancement)
filters disclosed in the references 1 to 3. In these diagrams, A to D represent, respectively, characteristics of the
synthesizer 202, characteristics of the filter 204, inverse characteristics of the filter 205, and the transfer function H(z).
For example, in Figs. 30 and 33, A represents $1/A(z)$; B represents $1/A(z/\nu)$; C represents $1/A(z/\eta)$; and D
represents $H(z) = A(z/\eta)/A(z/\nu)$. As is apparent from the expression (2) relating to reference 1 and also from Figs. 33 to
35 relating to references 1 to 3, the filter 204 functions as a filter enhancing formants of spectrum of the synthesized
speech signals and suppressing valleys of that spectrum, whilst the filter 205 functions as a filter eliminating a spectral
gradient induced by the filter 204. It is envisaged that the degree of enhancement and suppression by the filter 204 will
increase accordingly as ν becomes larger, and that it will decrease as ν becomes smaller. It is assumed in the reference
1 that η and ν satisfy 0 ≤ η ≤ ν < 1. Fig. 33 represents an example with ν = 0.8, η = 0.5; Fig. 34 an example using a
[...]; and Fig. 35 an example with p = 10, m = 4.

As is apparent from the comparison between Figs. 33 and 34 or from the comparison between Figs. 33 and 35, the
techniques disclosed in the references 2 and 3 will be able to heighten the effect of eliminating
[...] high-frequency [...].
On the contrary, the techniques disclosed in the references 2 and 3 will make it possible to heighten the effect of
[...] peak-valley structure of the spectrum and to render the spectral gradient flatter. This leads to a
prevention of deterioration in brightness and naturalness by the filter 203.

[...] the techniques disclosed in the references 2 and 3 are in one aspect an improvement
[...]. For example, [...] the technique disclosed
in the reference 2 has a deficiency that the resultant modified synthesized speech signals often involve unique
distortions. This arises from the fact that an extremely powerful spectrum smoothing process is [...]
[...] spectrum [...] distorted in the vicinity of [...] formants [...].

[...] the technique disclosed in the reference 3, due to a reduction in the filter order in the
autocorrelation domain, it often suffers from
[...] give rise to distortions
[...] characteristics B and C indicated in Fig.
35, for example, it can be seen that a phenomenon occurs in which the formant having the lowest frequency among the
[...] with the result that the
resultant modified synthesized speech will fluctuate unnaturally.

[...] a problem of a low degree of freedom of
design (freedom in operation and control of characteristics). In the case of the technique disclosed in the reference 1,
[...] do not become so marked. In
[...] attributable to the spectrum smoothing process within the autocorrelation domain will become more significant.
Therefore the variable ranges of ν and lag window frequency must be restricted, making it impossible to greatly change the
[...] will be naturally lowered since it employs the filter order as its control variable, which is a finite integral value.

[...] of the speech modification (or enhancement) filter 203 disclosed in
the reference 4. [...] differs greatly from the above-described prior art techniques in that it
[...] synthesized speech signals through filtering, using as its
filter coefficient modified mel-scaled cepstrum obtained by modifying input mel-scaled cepstrum. That is, synthesized
[...] coefficients modified mel-scaled cepstrum generated by
[...] 214. More specifically, the mel-scaled cepstrum modification section 214
[...] of the modified mel-scaled cepstrum as
[...] synthesized speech signals, and provides obtained signals as its output in the form of
[...] 213 is referred to as a mel-scaled log-spectral approximation
(MLSA) filter since it employs the modified mel-scaled cepstrum as its filter coefficient.

[...] analyzer 101 through orthogonal
[...] be impossible for the techniques [...]
[...] of spectral geometry, which will necessitate calculation of LPC
through [...] signals. In addition, even the thus calculated LPC contains distortions
relative to the LPC obtained through the analysis of original speech, and hence it will not ensure [...].

[...] reference 4 will face a problem of poor connectability, in
other words, of impossibility of application to systems designed to synthesize the speech signals [...]
group other than cepstrum parameters. Typical of such systems are, for example, ones using parameter groups such
as LPC, LSP (line spectrum pairs), and PARCOR (partial autocorrelation coefficients). This problem is serious since the
LPC, LSP and PARCOR are often used for speech coding/decoding. If a speech modification filter using mel-scaled
cepstrum as its filter coefficient is incorporated into the synthesizing unit 200 receiving LPCs as one of parameters,
then the spectral geometry will be distorted with the transformation from the LPC domain into the mel-scaled cepstrum
domain, as described hereinbefore. It is natural that this distortion can be eliminated to some degree by again calculating
the mel-scaled cepstrum through re-analysis of the synthesized speech signals. Even though the mel-scaled cepstrum
has been calculated in this manner, however, it will still contain more distortions compared with the mel-scaled
cepstrum which would be derived from the original speech. Thus, not very good speech modification characteristics are
to be expected.

SUMMARY OF THE INVENTION 

A first object of the present invention is to provide a speech modification (or enhancement, which will be omitted
hereinafter) filter ensuring a good formant enhancement effect within a range of permissible spectral gradients. A second
object of the present invention is to provide a speech modification filter ensuring a good formant enhancement
effect without causing any perceptible level of distortion in the formant structure. A third object of the present invention
is to provide a speech modification filter capable of implementing the same formant enhancement effect as the prior art
by using a lower number of constituent means than the prior art. A fourth object of the present invention is to provide a
speech modification filter allowing selective execution of the control of brightness, reduction in the processing
procedures, improvement in intelligibility, etc. A fifth object of the present invention is to avoid the necessity of the stability
proof in the domain whose nature is different from the domain to which the input spectral information belongs, and to
thereby provide a speech modification filter having a high degree of freedom of design. A sixth object of the present
invention is to provide a speech modification filter suitable for a synthesizing unit which receives LSP, PARCOR, LAR
(log area ratio), etc., as spectral information from the analyzing unit side. A seventh object of the present invention is to
provide a speech modification filter ensuring, upon the input of LSP, PARCOR, LAR, etc., as spectral information, a
good connectability without the need for any spectrum re-analysis or parameter transform. It is an eighth object of the
present invention to implement a speech synthesizing system by use of the speech modification filter which is able to
achieve the above first to seventh objects.

According to a first aspect of the present invention, synthesized speech signals are filtered through a transfer
function defined by a filter coefficient, to generate modified synthesized speech signals. This filter coefficient is generated
on the basis of spectral information represented in the form of a multi-dimensional vector and belonging to a
predetermined domain and pertaining to input speech signals, in such a manner that formant characteristics of the modified
synthesized speech signals are enhanced in accordance with the above spectral information and in comparison with those
of the synthesized speech signals. Available as the spectral information is any one of LSP information, PARCOR
information and LAR information. Because of specific features of the LSP information, PARCOR information and LAR
information, the operations for generating the filter coefficients can be performed as operations of such a nature that
arithmetic associated with individual dimensions is dependent on arithmetic associated with the remaining dimensions.
When using the LSP, PARCOR or LAR information to generate filter coefficients, the filter stability can be secured
without transforming them from the LSP, PARCOR or LAR domain to another domain. Please note that in the filter using,
for example, the filter coefficients generated from the LPC information, it is necessary to transform the filter coefficients
from the LPC domain to another domain to prove the stability of the filter. In consequence, according to the first aspect
of the present invention, it is easier to design the speech modification process or filter without introducing instability
thereto, than the prior arts using the filter coefficients generated from the LPC information. In addition, application of
this aspect to systems transmitting or storing the LSP information, PARCOR information, or LAR information would not
need any spectrum re-analysis and parameter transformation, whereby a good connectability can be ensured.
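The remark on stability can be made concrete with the well-known in-domain conditions: an all-pole synthesis filter is stable when every PARCOR coefficient has magnitude below 1, and when the LSP frequencies are strictly increasing inside (0, π). The following Python sketch is a hedged illustration only; the function names are hypothetical, and the LAR definition g = log((1 + k)/(1 - k)) is one common convention assumed here (the opposite sign is also in use).

```python
import math

def parcor_is_stable(k):
    """Stable when every reflection (PARCOR) coefficient satisfies |k_i| < 1."""
    return all(abs(ki) < 1.0 for ki in k)

def lar_to_parcor(g):
    """Invert the assumed definition g = log((1 + k) / (1 - k)), giving k = tanh(g / 2).
    Any finite LAR of moderate size maps to |k| < 1; extreme values saturate
    to exactly 1.0 in floating point and should be clipped in practice."""
    return [math.tanh(gi / 2.0) for gi in g]

def lsp_is_stable(w):
    """Stable when 0 < w_1 < w_2 < ... < w_p < pi (LSP frequencies in radians)."""
    bounds = [0.0, *w, math.pi]
    return all(lo < hi for lo, hi in zip(bounds, bounds[1:]))
```

Each check touches only the parameters themselves, which is the practical sense in which no transformation to another domain is needed.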

The filtering in the present invention can be performed within any one of the LPC domain, LSP domain and PARCOR
domain. In other words, the filter coefficients in the present invention can belong to any one of the LPC domain,
LSP domain and PARCOR domain. According to a second aspect of the present invention, spectral information is first
modified within a domain to which it belongs to generate modified spectral information, and the modified spectral
information is then transformed from that domain into the LPC domain to generate filter coefficients, and the thus obtained
filter coefficients are used for filtering within the LPC domain. Since a variety of modified coefficients can be employed
for this modification, this aspect will make it possible to more freely modulate the filter coefficient synthesis than the prior
arts, in accordance with filtering characteristics (synthesized speech signal modification characteristics) demanded by
the users.
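For the case where the spectral information is LSP, the transformation into the LPC domain mentioned in the second aspect can be sketched with the standard sum and difference polynomial construction. The sketch assumes an even order p, LSP frequencies in radians sorted in increasing order, and A(z) with leading coefficient 1; the name `lsp_to_lpc` is hypothetical and the routine is not claimed to be the transform section of the embodiments.

```python
import numpy as np

def lsp_to_lpc(w):
    """Convert LSP frequencies w (radians, ascending, even count p) to the
    coefficients [1, a_1, ..., a_p] of A(z) = (P(z) + Q(z)) / 2."""
    w = np.asarray(w, dtype=float)
    p_poly = np.array([1.0, 1.0])     # symmetric polynomial P(z) has the factor (1 + z^-1)
    q_poly = np.array([1.0, -1.0])    # antisymmetric polynomial Q(z) has the factor (1 - z^-1)
    for wi in w[0::2]:                # 1st, 3rd, ... LSP frequencies are roots of P(z)
        p_poly = np.convolve(p_poly, [1.0, -2.0 * np.cos(wi), 1.0])
    for wi in w[1::2]:                # 2nd, 4th, ... LSP frequencies are roots of Q(z)
        q_poly = np.convolve(q_poly, [1.0, -2.0 * np.cos(wi), 1.0])
    a = 0.5 * (p_poly + q_poly)
    return a[:-1]                     # the z^-(p+1) terms of P and Q cancel
```

Equally spaced frequencies iπ/(p+1) yield A(z) = 1, that is, a flat spectrum, which makes a simple check of the construction.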

According to a third aspect of the present invention, the spectral information is so modified as to reduce the peaks
of formants of the modified synthesized speech signals. Therefore this will make it possible to obtain a good formant
enhancement effect within a range of permissible spectral gradients and to obtain a good formant enhancement effect
without causing any perceptible level of distortions in the formant structure.
Conceivable as a first method for modification is a method in which the spectral information pertaining to the input
[...] information belonging to the same domain are proportionally divided in accordance
[...] spectral information is LSP information. Depending
[...] reference information, this method would make it possible to perform the following
modifications: a modification for imparting a fixed spectral gradient to the modified synthesized speech signals;
a modification for imparting [...] spectrum to the modified synthesized speech
signals (that is, a modification for slightly enhancing a speech spectrum other than the noise spectrum); and a
modification for imparting to the modified synthesized speech signals a spectrum gradient reflecting a history which the
[...] (that is, a modification for enhancing the amount of variation in the speech spectrum).
This will make it possible to effect control of the brightness, reduction in the information processing procedures, and
[...] characteristics of the other secondary filtering processes (for example, a fixed high-frequency enhancement process).
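A proportional division between the input LSP and reference LSP of the same order can be sketched as a weighted average per dimension. The weighting form, the parameter name `ratio`, and the choice of a flat-spectrum reference are assumptions made for illustration; the text above only states that the two vectors are proportionally divided.

```python
import numpy as np

def flat_reference_lsp(p):
    """LSP of a flat spectrum (A(z) = 1): equally spaced frequencies i*pi/(p+1)."""
    return np.pi * np.arange(1, p + 1) / (p + 1)

def modify_lsp_by_division(w, w_ref, ratio):
    """Internally divide each dimension between w and w_ref.
    ratio = 0 returns w unchanged; ratio = 1 returns the reference."""
    w = np.asarray(w, dtype=float)
    w_ref = np.asarray(w_ref, dtype=float)
    return (1.0 - ratio) * w + ratio * w_ref
```

With `ratio` in [0, 1] the result is a convex combination of two strictly increasing sequences inside (0, π), so the ordering that guarantees stability in the LSP domain is preserved without any further check.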

Conceivable as a second method for modification is a method in which [...] information is multiplied by a modified
coefficient [...]
PARCOR information or LAR information. This method also ensures some of the effect listed above, e.g. the reduction
[...] information, use is made of the method multiplying the spectral information by the power of the modified coefficient and that
said power is dependent on the dimension of the spectral information.
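Multiplying each dimension by a power of the modified coefficient can be sketched for PARCOR information as follows. The exponent equal to the dimension index i, the function names, and the sign convention of the step-up recursion (A(z) = 1 + a_1 z^-1 + ...) are assumptions for illustration and are not confirmed by the text.

```python
import numpy as np

def modify_parcor(k, gamma):
    """Scale the i-th PARCOR coefficient by gamma**i, i = 1..p (assumed exponent)."""
    k = np.asarray(k, dtype=float)
    return k * gamma ** np.arange(1, len(k) + 1)

def parcor_to_lpc(k):
    """Step-up recursion: a_i(m) = a_i(m-1) + k_m * a_(m-i)(m-1), a_m(m) = k_m."""
    a = np.array([1.0])
    for km in k:
        ext = np.append(a, 0.0)       # extend the predictor by one order
        a = ext + km * ext[::-1]      # add the reversed predictor scaled by k_m
    return a
```

For |gamma| ≤ 1 every modified coefficient keeps a magnitude below 1 whenever the input does, so the modified filter remains stable in the PARCOR domain.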

Conceivable as a third method for modification is a method in which distances are expanded between adjacent
[...] dimensions representative of the spectral information pertaining to the input speech
[...] dimensions so as to ensure that the extent of the spectral information in its entirety becomes coincident with the extent before
[...] of synthesized speech signals is flattened and ensures
[...] spectral gradient. In addition, the reduction of the process or the components relative to the first and second methods is
realized.
[...] modification methods are combined with each other. In that case,
[...] or alternatively both may be used cooperatively. As to
[...] three methods it will be apparent
from the later description on embodiments for the person skilled in the art.
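One possible reading of the third method, for LSP information, is to widen every distance between adjacent dimensions and then rescale so that the overall extent matches the extent before expansion. The additive widening by a constant `delta` and the name `expand_lsp_gaps` are assumptions made here; the passage does not specify how the distances are expanded.

```python
import numpy as np

def expand_lsp_gaps(w, delta):
    """Widen each adjacent distance by delta, then rescale all distances so that
    the first and last LSP frequencies stay where they were."""
    w = np.asarray(w, dtype=float)
    gaps = np.diff(w) + delta                    # expanded adjacent distances
    scale = (w[-1] - w[0]) / gaps.sum()          # restore the original overall extent
    return w[0] + np.concatenate(([0.0], np.cumsum(gaps * scale)))
```

Because narrow distances grow proportionally more than wide ones, closely spaced LSP pairs (sharp spectral peaks) are relaxed most, while the ordering of the frequencies is kept for any non-negative `delta`.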

[...]
[...] speech signals in correlation with modified spectral information and generates the modified spectral
[...] of the spectral information; and secondly, a neural network which has acquired, by
[...] to transform spectral information into modified spectral information so as to be able to generate the
[...]. It is preferable that
[...] for each of a plurality of categories which do not overlap with
[...] information about input speech signals
[...] coefficients for each category. This
[...] modification method other than the first to third methods for

[...] filtering is executed within any one of the LSP
[...] a domain to
[...] is used as a filter coefficient. This aspect will eliminate
[...] associated with the modified spectral information, making it possible to provide
substantially the same formant enhancement effect as the prior art by less number of constituent elements than the
[...] filtering that formants of the modified synthesized
[...] synthesized speech signals. According to sixth
[...]

According to a seventh aspect of the present invention, synthesized speech signals are generated on the basis of
[...] multi-dimensional vector and belonging to a predetermined domain and
[...] the processes involved with the above-described aspects are executed on
[...]
are generated on the basis of first spectral information represented as a multi-dimensional vector and belonging to a
predetermined domain and pertaining to input speech signals, and the first spectral information is transformed into second
spectral information belonging to a domain different from the domain to which the first spectral information has
belonged so far, and then the processes involved with the above-described aspects are executed on the basis of the
second spectral information. According to a ninth aspect of the present invention, synthesized speech signals are
generated on the basis of first spectral information pertaining to input speech signals and belonging to a predetermined
domain and represented as a multi-dimensional vector, and the synthesized speech signals are analyzed to generate
second spectral information, and then the processes involved with the above-described aspects are executed on the
basis of the second spectral information. According to a tenth aspect of the present invention, previous to the processes
involved with the seventh to ninth aspects, spectral information or first spectral information is generated through the
analysis of input speech signals, and the spectral information or the first spectral information is stored or transmitted.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 and Fig. 2 are block diagrams each showing a configuration of a speech modification filter in accordance with
an LSP-based embodiment among preferred embodiments of the present invention;

Fig. 3 is a block diagram showing, by way of example, a configuration of a speech analysis/synthesis system;

Fig. 4 is a block diagram showing an example of an LSP modification method; 

Fig. 5 is an explanatory diagram of a method of generating modified LSP through a proportional division; 

Fig. 6 and Fig. 7 are block diagrams each showing an example of the LSP modification method; 
Fig. 8 is a graphical representation of log-power vs. frequency spectrum characteristics of the LSP-based embodiment
among the preferred embodiments of the present invention, which characteristics are obtained in the case of
using a method of generating the modified LSP through the proportional division in the Fig. 1 configuration;

Fig. 9 is a block diagram showing an example of the LSP modification method; 

Fig. 10 is a graphical representation of log-power vs. frequency spectrum characteristics of the LSP-based
embodiment among the preferred embodiments of the present invention, which characteristics are obtained in the
case of using a method of generating the modified LSP through the expansion of distances between adjacent
dimensions in the Fig. 2 configuration;

Fig. 11, Fig. 12, Fig. 13, Fig. 14, Fig. 15 and Fig. 16 are block diagrams each showing an example of the LSP
modification method;

Fig. 17 and Fig. 18 are block diagrams each showing a configuration of a speech modification filter in accordance
with an embodiment executing filtering within LSP domain, among the preferred embodiments of the present
invention;

Fig. 19 is a block diagram showing a configuration of a speech modification filter in accordance with a PARCOR- 
based embodiment among the preferred embodiments of the present invention; 
Fig. 20 is a graphical representation of log-power vs. frequency spectrum characteristics of the PARCOR-based
embodiment among the preferred embodiments of the present invention;

Fig. 21 and Fig. 22 are block diagrams each showing a configuration of a speech modification filter in accordance
with an embodiment executing filtering within PARCOR domain among the preferred embodiments of the present
invention;

Fig. 23 is a block diagram showing a configuration of a speech modification filter in accordance with an LAR-based
embodiment among the preferred embodiments of the present invention;

Fig. 24 is a graphical representation of log-power vs. frequency spectrum characteristics of the LAR-based
embodiment among the preferred embodiments of the present invention;

Fig. 25 and Fig. 26 are block diagrams each showing a configuration of a speech modification filter in accordance
with an embodiment executing filtering within an LAR domain or a PARCOR domain among the preferred
embodiments of the present invention;

Fig. 27 is a block diagram showing a configuration of a speech modification filter in accordance with an embodiment
utilizing a plurality of parameters among the preferred embodiments of the present invention;

Fig. 28 is a block diagram illustrating, by way of example, a configuration of a speech analysis/synthesis system;

Fig. 29 is a block diagram illustrating a manner of using a speech modification filter;

Fig. 30, Fig. 31 and Fig. 32 are block diagrams illustrating configurations of the speech modification filters disclosed
in reference 1, reference 2 and reference 3, respectively;

Fig. 33, Fig. 34 and Fig. 35 are graphical representations of log-power vs. frequency spectrum characteristics of
the speech modification filters disclosed in the reference 1, reference 2 and reference 3, respectively; and

Fig. 36 is a block diagram illustrating a configuration of the speech modification filter disclosed in reference 4.
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PETAII.EP DESCRIPTION OF THE PREFERRgn F MBQDIMPMTR 

Preferred embodiments of the present invention will now be described with reference to the accompanying drawings, in which constituent elements identical or corresponding to the prior art techniques shown in Figs. 28 to 36 are designated by the same reference numerals and will not be further explained. It is to be noted that constituent elements common to respective embodiments are also designated by the same reference numerals and will not be repeatedly explained.

a) LSP-based Embodiment 

Figs. 1 and 2 depict embodiments receiving LSP as spectral information in the decoded parameter group, among preferred embodiments of a filter 203 in accordance with the present invention. The embodiment of Fig. 1 comprises LSP modification sections 216 and 217 and LSP/LPC transform sections 218 and 219 in addition to the filters 204 and 205, whereas the embodiment of Fig. 2 comprises the LSP modification section 216 and the LSP/LPC transform section 218 in addition to the filter 204.

These embodiments can be used in the synthesizing unit 200 having a configuration as shown in Fig. 29 or Fig. 3. In the case of using the decoder 201 which provides LSP as one of the decoded parameter group, the filter 203 can directly receive the output from the decoder 201 as shown in Fig. 29, whereas in the case of using the decoder 201 which is not capable of providing LSP, the spectral information is transformed into LSP by a transform section 215 and supplied to the filter 203, as shown in Fig. 3. It is to be appreciated that the transform section 215 may be integrated into the decoder 201 or the synthesizer 202.

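The LSP modification sections 216 and 217 modify LSP $\omega_i$, received as a multi-dimensional vector from the decoder 201 or the transform section 215, to generate modified LSP $\omega h1_i$ and $\omega h2_i$, respectively. The LSP/LPC transform sections 218 and 219 transform $\omega h1_i$ and $\omega h2_i$, respectively, from the LSP domain into the LPC domain to generate filter coefficients $\alpha 1_i$ and $\alpha 2_i$, respectively. The filters 204 and 205 filter the synthesized speech signals using $\alpha 1_i$ and $\alpha 2_i$, respectively, as their respective filter coefficients, and the filter 203 provides the modified synthesized speech signals as its output. Now, let the transfer functions of the filters 204 and 205 be $1/A_1(z)$ and $A_2(z)$, respectively. The transfer function of the filter 203 of Fig. 1 can then be given as

$$H(z) = A_2(z)/A_1(z) \qquad (3)$$

and the transfer function of the filter 203 of Fig. 2 can be given as

$$H(z) = 1/A_1(z) \qquad (4)$$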
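As a rough illustration of how expressions (3) and (4) could be evaluated numerically, the following Python sketch converts an LSP vector into the coefficients of $A(z)$ and computes the log-power response of $H(z)$. The names `lsp_to_lpc` and `log_power_response` are hypothetical. The specification does not spell out the transform used by the sections 218 and 219, so the textbook sum/difference polynomial construction is assumed here, together with an even prediction order.

```python
import numpy as np


def lsp_to_lpc(omega):
    """Return [1, a_1, ..., a_p] of A(z) for ascending LSP frequencies in radians."""
    omega = np.asarray(omega, dtype=float)
    p = omega.size
    if p % 2 != 0:
        raise ValueError("this sketch assumes an even prediction order p")
    sym = np.array([1.0, 1.0])    # P(z): symmetric part, fixed root at z = -1
    asym = np.array([1.0, -1.0])  # Q(z): antisymmetric part, fixed root at z = +1
    for k, w in enumerate(omega):
        # Conjugate root pair on the unit circle at angle w.
        pair = np.array([1.0, -2.0 * np.cos(w), 1.0])
        if k % 2 == 0:            # omega_1, omega_3, ... belong to P(z)
            sym = np.convolve(sym, pair)
        else:                     # omega_2, omega_4, ... belong to Q(z)
            asym = np.convolve(asym, pair)
    # A(z) = (P(z) + Q(z)) / 2; the z^-(p+1) terms cancel.
    return (0.5 * (sym + asym))[: p + 1]


def log_power_response(num, den, n_points=256):
    """Log-power (dB) of num(z)/den(z) at n_points frequencies in [0, pi)."""
    w = np.linspace(0.0, np.pi, n_points, endpoint=False)

    def evaluate(coeffs):
        coeffs = np.asarray(coeffs, dtype=float)
        return np.exp(-1j * np.outer(w, np.arange(coeffs.size))) @ coeffs

    return w, 20.0 * np.log10(np.abs(evaluate(num) / evaluate(den)))
```

Under these assumptions, `log_power_response(lsp_to_lpc(wh2), lsp_to_lpc(wh1))` would correspond to expression (3) and `log_power_response([1.0], lsp_to_lpc(wh1))` to expression (4), where `wh1` and `wh2` stand for the modified LSP vectors.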

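In the LSP-based embodiment of the present invention, in this manner, LSP $\omega_i$ received as one of parameters is modified, and the modified LSP $\omega h1_i$ (and LSP $\omega h2_i$) are transformed from the LSP domain into the LPC domain to generate the filter coefficients. The stability condition of a filter using LSP-based filter coefficients is given by

$$0 < \omega_1 < \omega_2 < \cdots < \omega_p < \pi \qquad (5)$$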

Therefore, so long as LSP satisfying equation (5) is used as the filter coefficient, the process for generating $\alpha 1_i$ and $\alpha 2_i$ can be performed independently for respective $i$ without introducing instability to the filter. As a result, a high degree of freedom of design can be obtained. In references 1 to 3, by contrast, only a process with proof that it would not introduce instability to the filter can be used to generate filter coefficients, since in the $\alpha$ parameter domain or in the autocorrelation domain it is difficult to judge the stability of the filter directly from the parameters. Accordingly, a modification process performed for respective $i$, or with adjustment of the degree of enhancement along the frequency axis, may lead to the introduction of instability to the filter when the $\alpha$ parameter based or the autocorrelation based filter coefficients are used.
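The ordering condition of expression (5) is simple to test after any per-dimension modification. The following is a minimal sketch under that reading; the helper name `lsp_is_stable` is hypothetical and not taken from the specification.

```python
import numpy as np


def lsp_is_stable(omega):
    """True when 0 < omega_1 < omega_2 < ... < omega_p < pi holds."""
    omega = np.asarray(omega, dtype=float)
    # Pad with the fixed end points 0 and pi, then require strictly positive gaps.
    bounded = np.concatenate(([0.0], omega, [np.pi]))
    return bool(np.all(np.diff(bounded) > 0.0))
```

A check of this kind only needs one pass over the vector, which is consistent with the remark that the LSP domain makes stability easy to secure.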

Another advantage of the LSP-based embodiment lies in a higher applicability to the systems transmitting or storing LSP as the spectral information. Most of the speech coding/decoding systems, in particular those which have been developed recently, use LSP as spectral information. The LSP-based embodiment of the present invention is easily applicable to such types of speech coding/decoding system. That is, due to the fact that there is no need for re-analysis of the spectrum and transformation of parameters, a good connectability can be obtained to such types of systems, unlike the prior art where the filter coefficients are determined on the basis of input mel-scaled cepstrum as disclosed in reference 4.

As is apparent from the above description, the transfer function $H(z)$ of the filter 203 in the LSP-based embodiment of the present invention will depend on the manner of performing the LSP modifying operation and LSP/LPC transforming operation to obtain the filter coefficients $\alpha 1_i$ and $\alpha 2_i$. Preferred methods for the LSP modifying operation are, firstly, a proportional division modification and, secondly, an adjacent dimension-to-dimension distance expansion.

The proportional division modification mentioned first is a method in which $\omega_i$ is proportionally divided using modified coefficients $\nu$, $\eta$ satisfying $0 \le \nu \le \eta < 1$ as proportional division ratios. When this method is executed in the configuration of Fig. 1, the LSP modification sections 216 and 217 each have a functional configuration including a proportional division operating section 220 and a gradient setting section 221 as shown in Fig. 4 for example. The proportional division operating section 220 generates $\omega h1_i$ or $\omega h2_i$ in accordance with the following expression for proportional division:

$$\omega h1_i = \omega_i \times (1 - \nu) + \omega f_i \times \nu \quad \text{or} \quad \omega h2_i = \omega_i \times (1 - \eta) + \omega f_i \times \eta \qquad (6)$$

where $i = 1, 2, \ldots, p$.

The gradient setting section 221 sets $\omega f_i$ in the proportional division operating section 220 on the basis of the linear prediction order $p$. It is to be appreciated that $\omega f_i$ used in the LSP modification section 216 may be different in value from $\omega f_i$ of section 217. Also the modification of $\omega_i$ through the proportional division may be applied to the configuration of Fig. 2.

A first advantage of the proportional division is to ensure an improved formant enhancement effect. That is, when $\omega h1_i$ and $\omega h2_i$ generated through the proportional division are transformed from the LSP domain into the LPC domain, formants become dull, with the result that a good formant enhancement effect can be obtained. "Formants become dull" herein means that "peaks of formants become small", in other words, "spectral characteristics flatten while leaving the spectrum having a somewhat peak-valley structure".

A second advantage of the proportional division is to ensure a high degree of freedom of designing characteristics in conformity with demands of the users, such as varying the degree of modifying the synthesized speech signals for each frequency band. In particular, by designing $\omega f_i$ besides $\nu$ and $\eta$, the characteristics of the filter 203 can be varied so as to well meet the demands of the users. This high degree of freedom of design will lead to an effect that, within a range of permissible spectral gradients, a better formant enhancement effect surpassing the conventional techniques can be easily obtained.

It is envisaged that there are several methods of setting $\omega f_i$. A first method is to set LSP representative of a flat spectrum as $\omega f_i$. The gradient setting section 221 implemented in conformity with this method sets $\omega f_i$ in such a manner that the $\omega f_i$ adjacent dimension-to-dimension distance ($= \omega f_i - \omega f_{i-1}$) results in a certain value represented as $\pi/(p+1)$, in accordance with the following expression

$$\omega f_i = \pi \times i/(p+1) \qquad (7)$$

Fig. 5 conceptually illustrates, taking $\omega h1_i$ generation as an example, the modifying-by-proportional-division operation which will take place when setting $\omega f_i$ in accordance with the expression (7). Note that an assumption of $p = 10$ is made herein. This method has the advantage of its functional simplicity in the gradient setting section 221.
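The proportional division of expression (6) with the flat reference of expression (7) can be sketched in a few lines of Python. The function names `flat_reference_lsp` and `proportional_division` are illustrative only, and the single argument `ratio` stands for either modified coefficient, $\nu$ or $\eta$.

```python
import numpy as np


def flat_reference_lsp(p):
    """Expression (7): LSP of a flat spectrum, spaced pi / (p + 1) apart."""
    i = np.arange(1, p + 1)
    return np.pi * i / (p + 1)


def proportional_division(omega, omega_f, ratio):
    """Expression (6): move each omega_i toward omega_f_i by the given ratio."""
    omega = np.asarray(omega, dtype=float)
    omega_f = np.asarray(omega_f, dtype=float)
    if not 0.0 <= ratio < 1.0:
        raise ValueError("the proportional division ratio must lie in [0, 1)")
    return omega * (1.0 - ratio) + omega_f * ratio
```

Because the result is a weighted average of two ascending sequences with a positive weight on the input LSP, its elements remain in ascending order between 0 and $\pi$, so the stability condition of expression (5) is preserved for any ratio in the stated range.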

A second method is to set LSP representative of a fixed gradient spectrum as $\omega f_i$. The gradient setting section 221 implemented in conformity with this method sets $\omega f_i$ in such a manner that the $\omega f_i$ adjacent dimension-to-dimension distance linearly increases or decreases, in accordance with the following expression obtained by adding the term $\delta(i)$ depending on $i$ to the right side of the expression (7)

$$\omega f_i = \pi \times i/(p+1) + \delta(i) \qquad (7a)$$

In this case it could easily be seen by those skilled in the art from the above description and the disclosure of Fig. 5 how the proportional division modification action takes place. This method firstly has the advantage of allowing the brightness to be controlled through the setting of the proportional coefficient of $\omega_i$, since a substantially fixed gradient can be imparted to the characteristics of the filter 203. It secondly has the advantage of allowing the processing procedures to be reduced, since the transfer function $H(z)$ of this filter 203 can contain the characteristics of a fixed high-frequency enhancement process, which may thus be carried out almost simultaneously with the ordinary formant enhancement process. It thirdly has the advantage that it can be applied to suppress the brightness variation by changing $\delta(i)$ to $\delta(\omega_i)$ and modifying its functional block as indicated by the dotted line in Fig. 5.

A third method is to set as $\omega f_i$ an LSP obtained by modifying the LSP representative of an average noise spectrum through, for example, the proportional division process. The gradient setting section 221 implemented in conformity with this method sets $\omega f_i$, as shown in Fig. 6, by modifying LSP $\omega_i'$ representative of the average noise spectrum on the basis of the proportional division ratio $\nu'$ or $\eta'$, in accordance with the following expression

of I = CD,' X (1 - v-) + <o,' X v'or of , =. <Bj' X (1 - 1,-) + a),' x ti' (7b) 

where i = 1, 2, ... p. 

The advantage of this method lies in improved intelligibility due to the ability to somewhat enhance the speech spectrum instead of the noise spectrum. Incidentally, $\omega_i'$ can be obtained by averaging, through an average operation section 223, $\omega_i$ within a period which has been judged to be a noise period by a judgment section 222 shown in Fig. 6. It is also preferable that the modification process which $\omega_i'$ undergoes be set so as not to impart too extreme a spectral variation to the modified synthesized speech signals. For example, if $\omega f_i$ is made sufficiently dull, it will become possible to prevent any extreme spectral variation from occurring in the modified synthesized speech signals.
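The averaging performed by the average operation section 223 over frames judged to be noise could look like the sketch below. How the judgment section 222 decides that a frame is noise is not described here, so the boolean mask `is_noise` is simply assumed to be supplied by the caller, and the function name `average_noise_lsp` is hypothetical.

```python
import numpy as np


def average_noise_lsp(lsp_frames, is_noise):
    """Average the LSP vectors of the frames flagged as noise (one frame per row)."""
    frames = np.asarray(lsp_frames, dtype=float)
    mask = np.asarray(is_noise, dtype=bool)
    if not mask.any():
        raise ValueError("no frame was judged to be a noise frame")
    return frames[mask].mean(axis=0)
```

The mean of ascending LSP vectors is itself ascending, so the averaged vector can be used as a valid LSP before any further modification.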

A fourth method is to set as $\omega f_i$ an LSP obtained by modifying, for example through the proportional division process, an average value of $\omega_i$ during a period up to now after the start of action or during a past predetermined period. As shown in Fig. 7, the gradient setting section 221 implemented by this method finds an average value $\omega_i'$ of the past LSP $\omega_i$ through the average operation section 223 and sets $\omega f_i$ on the basis of this $\omega_i'$ and the proportional division ratio $\nu'$ or $\eta'$, in accordance with the expression (7b). The advantage of this method lies in improved intelligibility attributable to the ability to enhance variations in the speech spectrum. It is also preferable for the execution of this method that con-

Fig. 8 graphically represents the log-power vs. frequency spectrum characteristics of the filter 203 shown in Fig. 1, which will appear when $\omega_i$ is modified in accordance with the expressions (6) and (7). In the graph, A, B, C and D respectively represent the synthesizer 202 characteristics = $1/A(z)$, the filter 204 characteristics = $1/A_1(z)$, the filter 205 inverse-characteristics = $1/A_2(z)$ and the filter 203 characteristics = $A_2(z)/A_1(z)$, with ... 0.8. As shown in this graph, the characteristic D of this graph is flattened while leaving the spectrum peak-valley structure to a certain extent, in comparison with the characteristic D of Fig. 33. In Fig. 8, in this manner, a better formant enhancement effect can be seen compared with Fig. 33. Also the characteristic D of this graph presents less distortion with respect to the spectrum peak-valley structure than the characteristic D of Fig. 34. Furthermore, the characteristic D of this graph no longer presents the two phenomena which have been observed in the characteristics B and C of Fig. 35, namely the phenomenon affecting the formant at the lowest frequency and the integration of two formants in the middle. As an alternative to the proportional division process, another process having an effect of dulling the formants in the LSP domain may be employed to obtain similar advantages.

The present inventor has aurally compared the modified synthesized speech derived from the filter 203 of this embodiment, in which $\omega_i$ is modified as represented by the expressions (6) and (7), with the modified synthesized speech derived from the prior art filter. As a result, it has turned out that the speech modification filter of this embodiment presents an advantage over the prior art filter in terms of suppression of brightness degradation and that the former does not cause any unique distorted speech or any fluctuating tone.

The adjacent dimension-to-dimension distance expansion, which is a second preferred embodiment of the LSP modifying operation, can be executed by an expansion section 224 and a uniform compression section 225 as shown in Fig. 9. The expansion section 224 generates $s_i$ by shifting $\omega_i$, where both $s_i$ and $\omega_i$ belong to the LSP domain, so that the adjacent dimension-to-dimension distance $s_i - s_{i-1}$ can be made larger than the adjacent dimension-to-dimension distance $\omega_i - \omega_{i-1}$ (with respect to $\omega_i - \omega_{i-1}$, see Fig. 5). The uniform compression section 225 finds $\omega h1_i$ from $s_i$. It is to be noted in particular that $s_i$ as well as $\omega_i$ is a multi-dimensional vector. When this method is executed in the configuration of Fig. 2, the uniform compression section 225 finds $\omega h1_i$ in accordance with the following expression

$$\omega h1_i = s_i / s_{p+1} \times \pi \qquad (8)$$

and the expansion section 224 finds $s_i$ in accordance with the following expression

$$s_i = s_{i-1} + \max(\omega_i - \omega_{i-1},\ th) \qquad (9)$$

where $i = 1, 2, \ldots, p+1$;

$\omega_0 = 0$, $\omega_{p+1} = \pi$, $s_0 = 0$;

$th$: threshold value.

As is apparent from the above-described expressions (8) and (9), the adjacent dimension-to-dimension distance expansion is a process for securing at least a distance $th$ between the $(i-1)$th dimension and the $i$-th dimension from the result of comparison of $\omega_i - \omega_{i-1}$ with $th$, as defined in particular by the second term on the right side of the expression (9). This process allows LSP associated with the $(i+1)$th or upper dimensions to shift together upwardly by a distance corresponding to $th - (\omega_i - \omega_{i-1})$. Also the factor $\pi / s_{p+1}$ contained in the right side of the expression (8) is a factor for uniformly compressing the adjacent dimension-to-dimension distances in response to the ratio between the $\omega_i$ range 0 to $\pi$ and the $s_i$ range 0 to $s_{p+1}$ of the LSP. It will be understood that the present invention should not be construed to be limited by this defining expression, and that other defining expressions may be employed as long as they represent processes for expanding smaller adjacent dimension-to-dimension distances. Also the modification of $\omega_i$ by the adjacent dimension-to-dimension distance expansion may be applied to the configuration of Fig. 1. This would make it possible to further increase the degree of freedom of design of characteristics of the filter 203.
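Expressions (8) and (9) amount to clamping each adjacent gap from below and then rescaling the accumulated positions back into the range 0 to $\pi$. The following Python sketch shows one way to write this; the function name `expand_adjacent_distances` is hypothetical.

```python
import numpy as np


def expand_adjacent_distances(omega, th):
    """Apply expression (9) (gap clamping) followed by expression (8) (compression)."""
    omega = np.asarray(omega, dtype=float)
    # omega_0 = 0 and omega_{p+1} = pi are the fixed end points.
    bounded = np.concatenate(([0.0], omega, [np.pi]))
    # max(omega_i - omega_{i-1}, th) for i = 1 .. p+1.
    gaps = np.maximum(np.diff(bounded), th)
    # s_1 .. s_{p+1}, starting from s_0 = 0.
    s = np.cumsum(gaps)
    # Uniform compression so that s_{p+1} maps back onto pi.
    return s[:-1] / s[-1] * np.pi
```

With `th = 0` the input is returned unchanged, and for a positive threshold every output gap is at least `th * pi / s[-1]`, so closely spaced LSP pairs (sharp formants) are pulled apart while the ordering is kept.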

Referring next to Fig. 10, there are depicted log-power vs. frequency spectrum characteristics which will appear when this method is applied to the filter 203 of Fig. 2. In the graph, A, B and C respectively represent the synthesizer 202 characteristics = $1/A(z)$, the filter 204 ($th = 0.3$) characteristics = $1/A_1(z; th = 0.3)$ and the filter 204 ($th = 0.4$) characteristics = $1/A_1(z; th = 0.4)$. As is apparent from this graph, this method allows characteristics comparable to Figs. 33 and 34 to be presented by the filter 204 only (in other words, without using the filter 205 or any constituent element corresponding thereto). This means that a good speech modification filter can be implemented with a lower order filter than that of the known filters and that substantially the same formant enhancement effect as the conventional filters can be realized by a lower number of constituent elements. Furthermore the present inventor has aurally compared the modified synthesized speech obtained in this embodiment with that obtained in the traditional techniques. As a result, it has turned out that use of the speech modification filter of this embodiment will ensure a tone quality by no means inferior to that of the existing filters.

The two kinds of modification methods, that is, the proportional division modification and the adjacent dimension-to-dimension distance expansion, are not mutually exclusive and hence they may be used in cooperation. It is also conceivable, for example, that one of the LSP modification sections 216 and 217 executes the proportional division, the other being in control of the adjacent dimension-to-dimension distance expansion. Alternatively, as shown in Fig. 11, a configuration may be employed which includes switching means 228 and 229 for selectively using the proportional division modification section 226 serving to modify $\omega_i$ through the proportional division and the adjacent dimension-to-dimension distance expansion section 227 serving to expand the adjacent dimension-to-dimension distances of LSP. The proportional division modification section 226 may have any one of the above-described configurations shown in Figs. 4, 6 and 7. Alternatively, as shown in Fig. 12, a configuration could be employed in which the proportional division modification section 226 is connected in cascade with the adjacent dimension-to-dimension distance expansion section 227. By virtue of such configurations having a single LSP modification section serving both as the proportional division modification section 226 and the adjacent dimension-to-dimension distance expansion section 227, the degree of freedom of characteristic design of the filter 203 can be further increased. It may also be envisaged that the sequence of the proportional division modification section 226 and the adjacent dimension-to-dimension distance expansion section 227 shown in Fig. 12 is reversed. It is natural that other processes could be combined with both or either one of the proportional division modification and the adjacent dimension-to-dimension distance expansion.

Furthermore an $\omega_i$ adaptive process may be executed by the LSP modification sections 216 and 217. Conceivable as a method for rendering the proportional division based $\omega_i$ modification process $\omega_i$ adaptive is, for example, a method in which the $\omega_i$ space is divided into a plurality of subspaces (hereinafter referred to as categories) not overlapping one another and in which $\nu$ and $\eta$ are prepared (or switched) for each category. In this case, the LSP modification section may be provided for each category, for example, an LSP modification section 216-1 (or 217-1) corresponding to a first category, an LSP modification section 216-2 (or 217-2) corresponding to a second category, ... and an LSP modification section 216-N (or 217-N) corresponding to an N-th category (see Fig. 13). Alternatively a single LSP modification section 216 (or 217) may be prepared together with a modified coefficient switching section 230 serving to switch $\nu$ and $\eta$ in response to the categories or $i$ (see Fig. 14). The $\omega_i$ adaptive process has the advantage of realizing a flexible process which, for example, allows formant enhancement to be weakened only for a specified category such as a category causing distortions when the formant enhancement is raised. This would ensure a uniform or distortion-less improvement in the characteristics of the filter 203. It will be appreciated that since $\omega_i$ is a multi-dimensional vector, the category referred to herein is in general a multi-dimensional vector space.
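The switching of $\nu$ and $\eta$ per category can be illustrated as follows. The text does not say how the $\omega_i$ space is partitioned, so this sketch assumes, purely for illustration, that each category is the set of LSP vectors closest to one representative vector (a nearest-centroid partition, which yields non-overlapping regions apart from ties). The names `select_ratios`, `centroids` and `ratios` are hypothetical.

```python
import numpy as np


def select_ratios(omega, centroids, ratios):
    """Return (category index, (nu, eta)) for the category whose centroid is nearest."""
    omega = np.asarray(omega, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Squared Euclidean distance from omega to every category centroid.
    distances = np.sum((centroids - omega) ** 2, axis=1)
    k = int(np.argmin(distances))
    return k, ratios[k]
```

The selected pair would then be passed to the proportional division of expression (6), which corresponds to the role of the modified coefficient switching section 230.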

It is preferable that the $\omega_i$ modifying process in the LSP modification sections 216 and 217 be implemented by use of a translation table 231 as shown in Fig. 15. More specifically, the translation table 231 for correlating $\omega_i$ with $\omega h1_i$ or $\omega h2_i$ is prepared, allowing the LSP modification section 216 or 217 to provide $\omega h1_i$ or $\omega h2_i$ as its output when $\omega_i$ is given. The advantage of utilizing the translation table 231 lies in a reduction of processing time. This advantage will become more remarkable if a relatively complex expression is used as a principle expression for the $\omega_i$ modification process.

The $\omega_i$ modifying process in the LSP modification sections 216 and 217 may be implemented by a neural network 232 which has previously learned $\omega_i$ modification characteristics given by, for example, the expression (6), as shown in Fig. 16. A first advantage of utilizing the neural network 232 lies in a reduction of processing time. This advantage will become more remarkable if a relatively complex expression is used as a principle expression for the $\omega_i$ modification process. A second advantage of the neural network 232 lies in that the memory capacity can be reduced, due to the fact that there is no need to store the translation table 231, compared with the case of utilizing the translation table 231. A third advantage of utilizing the neural network 232 lies in the reduction of distortion. For example, in the $\omega_i$ adaptive embodiments shown in Figs. 13 and 14, distortions often appear at a boundary of categories in the modified synthesized speech signal, due to an abrupt change of $\nu$ and $\eta$ arising from a slight variation of $\omega_i$ across the boundary. In the translation table embodiment shown in Fig. 15, distortions often appear at a boundary of table addresses in the same way as in the embodiments of Figs. 13 and 14. On the contrary, in the neural network embodiment shown in Fig. 16, no such distortion occurs, since there is no category which causes an abrupt change in $\nu$ and $\eta$.
It is to be noted that the LSP-based embodiment of the present invention is not intended to be limited to the configuration which performs LPC filtering (and inverse-LPC filtering), and would allow parameters other than LPC to be used as its filter coefficients. For example, as shown in Figs. 17 and 18, the present invention could be implemented by use of an LSP filter 233 (and an inverse-LSP filter 234) utilizing as the filter coefficient $\omega h1_i$ (and $\omega h2_i$) as it is. The advantage of this configuration lies in that there is no need for the LSP/LPC transform sections 218 and 219.

b) PARCOR-based Embodiment

Referring now to Fig. 19, an embodiment entering PARCOR as spectral information is depicted. This embodiment comprises, besides the LPC filter 204 and the inverse-LPC filter 205, PARCOR modification sections 235 and 236 and PARCOR/LPC transform sections 237 and 238. The PARCOR modification section 235 enters PARCOR $\phi_i$ as spectral information from the decoder 201 or the transform section 215 and modifies this $\phi_i$ to generate modified PARCOR $\phi h1_i$. In the same manner, the PARCOR modification section 236 generates modified PARCOR $\phi h2_i$. The PARCOR/LPC transform section 237 transforms $\phi h1_i$ from the PARCOR domain into the LPC domain to generate a filter coefficient $\alpha 1_i$ for the LPC filter 204. The PARCOR/LPC transform section 238 also transforms $\phi h2_i$ from the PARCOR domain into the LPC domain to generate a filter coefficient $\alpha 2_i$ for the inverse-LPC filter 205.

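The PARCOR modification sections 235 and 236 generate $\phi h1_i$ and $\phi h2_i$, respectively, using modified coefficients $\nu$ and $\eta$ satisfying, for example, $0 \le \eta \le \nu < 1$, and in accordance with the following expressions

$$\phi h1_i = \phi_i \times \nu^i, \quad \phi h2_i = \phi_i \times \eta^i \qquad (10)$$

where $i = 1, 2, \ldots, p$.

Execution of such modification enables formants to dull on the PARCOR domain.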

Consequently this embodiment will ensure the same characteristic improvement effect as that of the above LSP-based embodiment (e.g., formant enhancement effect, and improvement in ability to adjust the degree of said enhancement) as well as free control/setting of the characteristics of the filter 203 in conformity with the demands of users. It is natural that the present invention should not be construed as being limited by the expression (10) and that other processes may be employed which make the formants dull within the PARCOR domain. Further, with respect to the filter using as its filter coefficient the PARCOR or a parameter generated on the basis of the PARCOR, it is relatively easy to prove and secure its stability on the PARCOR domain, since the stability condition of the filter is given by

$$-1 < \phi_i < 1 \qquad (11)$$

Therefore, so long as equation (11) is satisfied, the filter using PARCOR based filter coefficients is stable.
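The scaling of PARCOR coefficients and the stability condition of expression (11) can be sketched as below, together with a step-up recursion for the PARCOR/LPC transform. Several points are assumptions rather than statements of the specification: the per-dimension gains are passed in as a vector so that either a constant coefficient or an index-dependent one can be tried, the sign convention of the recursion is one of the two in common use, and all function names are hypothetical.

```python
import numpy as np


def scale_parcor(phi, gains):
    """Shrink each PARCOR coefficient by a gain in [0, 1], which dulls the formants."""
    phi = np.asarray(phi, dtype=float)
    gains = np.broadcast_to(np.asarray(gains, dtype=float), phi.shape)
    if np.any(gains < 0.0) or np.any(gains > 1.0):
        raise ValueError("gains are expected to lie in [0, 1]")
    return phi * gains


def parcor_is_stable(phi):
    """Expression (11): every coefficient lies strictly between -1 and 1."""
    return bool(np.all(np.abs(np.asarray(phi, dtype=float)) < 1.0))


def parcor_to_lpc(phi):
    """Step-up recursion: A_m(z) = A_{m-1}(z) + phi_m * z^-m * A_{m-1}(1/z)."""
    a = np.array([1.0])
    for k in np.asarray(phi, dtype=float):
        extended = np.append(a, 0.0)
        a = extended + k * extended[::-1]
    return a
```

Since a gain in the stated range can only reduce the magnitude of each coefficient, a stable input remains stable after scaling, whatever gains are chosen for the individual dimensions.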

Thus the PARCOR modification process can be performed independently for respective $i$. In addition, application to the systems transmitting or storing PARCOR as spectral information would ensure a good connectability due to the fact that there is no necessity for spectrum re-analysis and parameter transform. Fig. 20 graphically represents the log-power vs. frequency spectrum characteristics of the filter 203 in Fig. 19. In the graph, A, B, C and D respectively denote the synthesizer 202 characteristics = $1/A(z)$, filter 204 characteristics = $1/A_1(z)$, filter 205 inverse-characteristics = $1/A_2(z)$, and filter 203 characteristics = $A_2(z)/A_1(z)$, with $\nu = 0.98$ and $\eta = 0.9$. As is apparent from the comparison between Figs. 20 and 33, this embodiment allows the spectrum peak-valley structure to be left to some extent while the spectrum is flattened. Through aural comparisons of the modified synthesized speech, the present inventor has ascertained that use of the filter 203 of this embodiment will definitely not cause any unique distorted speech or any fluctuating tone, and will ensure a good formant enhancement effect.
It will be obvious to those skilled in the art from the disclosure of this specification that the details of this PARCOR-based embodiment can be constituted from the same viewpoint as the LSP-based embodiment. It will also be easily conceivable for those skilled in the art from the disclosure of this specification to exclude inverse-LPC filtering and constituent elements associated therewith as shown in Fig. 21 and to employ a configuration including a PARCOR filter 239 and an inverse-PARCOR filter 240 with modified PARCOR $\phi h1_i$ and $\phi h2_i$ used as its filter coefficients as shown in Fig. 22.

c) LAR-based Embodiment 

An embodiment entering LAR as spectral information is depicted in Fig. 23. This embodiment comprises, besides the LPC filter 204 and the inverse-LPC filter 205, LAR modification sections 241 and 242 and LAR/LPC transform sections 243 and 244. The LAR modification section 241 enters LAR $\psi_i$ as spectral information from the decoder 201 or the transform section 215 and modifies this $\psi_i$ to generate modified LAR $\psi h1_i$. In the same manner, the LAR modification section 242 also generates modified LAR $\psi h2_i$. The LAR/LPC transform section 243 transforms $\psi h1_i$ from the LAR domain into the LPC domain to generate a filter coefficient $\alpha 1_i$ for the LPC filter 204. The LAR/LPC transform section 244 transforms $\psi h2_i$ from the LAR domain into the LPC domain to generate a filter coefficient $\alpha 2_i$ for the inverse-LPC filter 205.

The LAR modification sections 241 and 242 generate $\psi h1_i$ and $\psi h2_i$, respectively, using modified coefficients $\nu$ and $\eta$ satisfying, for example, $0 \le \eta \le \nu < 1$, and in accordance with the following expressions

$$\psi h1_i = \psi_i \times \nu^i, \quad \psi h2_i = \psi_i \times \eta^i \qquad (12)$$

where $i = 1, 2, \ldots, p$.

Execution of such modification enables formants to dull on the LAR domain.

Consequently this embodiment will ensure the same characteristic improvement effect as that of the above LSP-based embodiment and the PARCOR-based embodiment (e.g., formant enhancement effect, and improvement in ability to adjust the degree of said enhancement) as well as free control/setting of the characteristics of the filter 203 in conformity with the demands of users. It is natural that the present invention should not be construed as being limited by the expression (12) and that other processes may be employed which make the formants dull within the LAR domain. Since the filter is proved and secured to be stable when filter coefficients generated on the basis of LAR are used, the LAR modification process in this embodiment is not restricted in terms of filter stability. Therefore, the degree of freedom of filter design in this embodiment is higher than in the prior art. In addition, application to the systems transmitting or storing LAR as spectral information would ensure a good connectability due to the fact that there is no necessity for spectrum re-analysis and parameter transform.
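The reason an LAR modification cannot destabilize the filter can be seen from the mapping between LAR and PARCOR. The sketch below assumes the common definition $\psi = \ln\{(1 + \phi)/(1 - \phi)\}$; the opposite sign convention also appears in the literature, and the specification does not state which one is meant. The function names are hypothetical.

```python
import numpy as np


def parcor_to_lar(phi):
    """Log area ratio of a PARCOR coefficient with |phi| < 1 (assumed convention)."""
    phi = np.asarray(phi, dtype=float)
    return np.log((1.0 + phi) / (1.0 - phi))


def lar_to_parcor(psi):
    """Inverse mapping: phi = tanh(psi / 2), which lies in (-1, 1) for any finite psi."""
    return np.tanh(np.asarray(psi, dtype=float) / 2.0)
```

Under this definition any finite modified LAR maps back to a PARCOR coefficient satisfying expression (11), apart from floating-point saturation for very large magnitudes, and the inverse mapping is a single elementwise function, which is in line with the remark that the LAR/PARCOR transform is simpler than the LAR/LPC transform.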

Fig. 24 graphically represents the log-power vs. frequency spectrum characteristics of the filter 203 in Fig. 23. In the graph, A, B, C and D denote respectively the synthesizer 202 characteristics = $1/A(z)$, filter 204 characteristics = $1/A_1(z)$, filter 205 inverse-characteristics = $1/A_2(z)$, and filter 203 characteristics = $A_2(z)/A_1(z)$, with $\nu = 0.9$ and $\eta = 0.7$. The comparison between Figs. 24 and 33 has revealed that this embodiment allows the spectrum to be flattened while leaving the spectrum peak-valley structure to some extent, resulting in a better formant enhancement effect compared with the configuration disclosed in reference 1. Also, in comparison with Fig. 34, Fig. 24 presents less distortion involved with the peak-valley structure of the spectrum. In Fig. 24 the phenomenon of integration of two formants in the middle no longer appears, which will become apparent from the comparison with the characteristics B and C of Fig. 35. Through aural comparisons of the modified synthesized speech, the present inventor has ascertained that use of the filter 203 of this embodiment will definitely not cause any unique distorted speech or any fluctuating tone, and will ensure a good formant enhancement effect.

It will be obvious to those skilled in the art from the disclosure of this specification that the details of this LAR-based embodiment can be constituted from the same viewpoint as the LSP-based embodiment and the PARCOR-based embodiment. It will also be easily conceivable from the disclosure of this specification for those skilled in the art to exclude inverse-LPC filtering and constituent elements associated therewith as shown in Fig. 26 and to employ a configuration including a PARCOR filter 239 and an inverse-PARCOR filter 240 with modified LAR $\psi h1_i$ and $\psi h2_i$ used as the basis of its filter coefficients. Further, to transform the modified LAR $\psi h1_i$ and $\psi h2_i$ from the LAR domain to the PARCOR domain, LAR/PARCOR transforming sections 246 and 247 are provided in Fig. 26. Since in general the LAR/PARCOR transforming process is simpler and easier to perform than the LAR/LPC transforming process, the LAR/PARCOR transforming sections 246 and 247 can be implemented with fewer processing steps or with smaller circuits than the LAR/LPC transforming sections 243 and 244. Therefore, according to the embodiment of Fig. 26, the filter coefficients $\alpha 1_i$ and $\alpha 2_i$ are derived within a shorter period, and the whole process performed by the filter 203 is reduced, in comparison with the embodiments of Figs. 23 and 25.

d) Supplement

It would be easily conceivable from the disclosure of this specification for those skilled in the art to selectively combine the above-described LSP-based embodiment, PARCOR-based embodiment, and LAR-based embodiment. It could also be easily conceived from the disclosure of this specification for those skilled in the art to combine each embodiment of the present invention with the conventional LPC-based apparatus. These various combinations contribute to the implementation of a filter 203 having a high degree of freedom of characteristic design, which could not be otherwise implemented. For example, as shown in Fig. 27, the filter coefficient $\alpha 1_i$ of the filter 204 may be defined by the same method as in reference 1, whereas the filter coefficient $\alpha 2_i$ of the filter 205 may be defined by the same method as in the PARCOR-based embodiment. This configuration would lead to a filter 203 presenting a lower spectral gradient than the characteristic D of Fig. 33 and less distortion in the vicinity of formants than the characteristic D of Fig. 34.

In front of or behind the filter 203, or in parallel with the filter 203, there may be disposed another filter to perform pitch enhancement processing, high-frequency enhancement processing, formant enhancement processing, etc.

Claims 

1. A filter comprising: 

filtering means for filtering synthesized speech signals through a transfer function defined by filter coefficients 
to generate modified synthesized speech signals; and 

filter coefficient generation means for generating said filter coefficients on the basis of spectral information
represented in the form of a multi-dimensional vector and belonging to a predetermined domain and pertaining to
input speech signals, in such a manner that formant characteristics of said modified synthesized speech signals
are enhanced in accordance with said spectral information and in comparison with those of said synthesized
speech signals;

said spectral information being any one of LSP information, PARCOR information and LAR information.

2. A filter according to claim 1, wherein

said filter coefficients belong to an LPC domain. 

3. A filter according to claim 2, wherein

said filter coefficient generation means includes: 

modification means for modifying said spectral information within said predetermined domain to generate
modified spectral information; and

means for transforming said modified spectral information from said predetermined domain into an LPC
domain to generate said filter coefficients.

4. A filter according to claim 3, wherein 

said modification means includes flattening means for modifying said spectral information so as to reduce
peaks of formants of said modified synthesized speech signals.

5. A filter according to claim 4, wherein 

said spectral information is LSP information, and wherein
said flattening means includes proportional division means for proportionally dividing, in accordance with a
modified coefficient, said spectral information and reference information belonging to the very same domain to
which said spectral information belongs to generate said modified spectral information.
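The proportional division recited in claim 5 can be sketched as a per-dimension internal division between the LSP vector and a reference vector. The following Python sketch is illustrative only: the linear form, the range of the modified coefficient `mu`, and the equally spaced (flat-spectrum) reference are assumptions, and the function names are hypothetical.

```python
import math

def flat_reference_lsp(order):
    """Equally spaced LSP frequencies in (0, pi); an assumed flat-spectrum reference."""
    return [math.pi * (i + 1) / (order + 1) for i in range(order)]

def proportional_division(lsp, ref, mu):
    """Divide each LSP dimension and its reference internally in the ratio mu : (1 - mu).

    mu = 0 returns the LSP vector unchanged; mu = 1 returns the reference.
    """
    if len(lsp) != len(ref):
        raise ValueError("lsp and ref must have the same dimension")
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    return [(1.0 - mu) * w + mu * r for w, r in zip(lsp, ref)]
```

Because both inputs are in ascending order and the combination is convex, the result is also in ascending order, so the modified vector remains a valid LSP set under these assumptions.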

6. A filter according to claim 5, wherein

said proportional division means proportionally divides said spectral information and said reference information
so as to impart a fixed spectral gradient to said modified synthesized speech signals.

7. A filter according to claim 5, wherein

said proportional division means proportionally divides said spectral information and said reference information
so as to impart to said modified synthesized speech signals a spectrum gradient reflecting an average noise
spectrum.

8. A filter according to claim 5, wherein

said proportional division means proportionally divides said spectral information and said reference informa- 
tion so as to impart to said modified synthesized speech signal a spectrum gradient reflecting a history which said 
spectral information has traced so far. 

9. A filter according to claim 4, wherein

said spectral information is either PARCOR information or LAR information, and wherein
said flattening means includes means for multiplying, for each of a plurality of dimensions constituting said
spectral information, said spectral information by a modified coefficient or by the power of said modified coefficient
to generate said modified spectral information.

10. A filter according to claim 9, wherein

said power is dependent on said dimension. 
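The multiplication recited in claims 9 and 10 can be sketched as follows for PARCOR information. This is a minimal Python sketch under stated assumptions: claim 10 says only that the power depends on the dimension, so the choice of exponent equal to the 1-based dimension index is an assumption, as are the range of `mu` and the function name.

```python
def scale_parcor(k, mu, per_dimension_power=True):
    """Shrink PARCOR coefficients toward zero to flatten formant peaks.

    With per_dimension_power=True the i-th coefficient (1-based) is multiplied
    by mu**i (assumed exponent); otherwise every coefficient is multiplied by mu.
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    return [ki * (mu ** (i if per_dimension_power else 1))
            for i, ki in enumerate(k, start=1)]
```

With `mu` in [0, 1], every coefficient whose magnitude is below 1 keeps a magnitude below 1, so a stable all-pole filter remains stable after the modification.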

11. A filter according to claim 3, wherein

said spectral information is LSP information, and wherein
said modification means includes distance expansion means for expanding distances between adjacent
dimensions among a plurality of dimensions representative of said spectral information to generate said modified
spectral information.

12. A filter according to claim 11, wherein

said distance expansion means includes:

expansion means for expanding said distances between adjacent dimensions beyond a reference distance when said
distances are less than said reference distance; and

compression means for equally compressing said distances with respect to all said adjacent dimensions, after
the expansion of said distances between adjacent dimensions by said expansion means, so as to ensure that
the extent of said spectral information in its entirety becomes coincident with the extent before expansion.
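The expansion and compression recited in claim 12 can be sketched in two passes over the gaps between adjacent LSP dimensions. The Python sketch below is illustrative only: raising a narrow gap exactly to the reference distance (rather than by some other rule) and measuring the extent as the span from the first to the last dimension are assumptions, and the function name is hypothetical.

```python
def expand_lsp_distances(lsp, ref_distance):
    """Widen narrow LSP gaps, then rescale all gaps so the overall extent is unchanged."""
    if len(lsp) < 2:
        return list(lsp)
    gaps = [b - a for a, b in zip(lsp, lsp[1:])]
    if any(g <= 0.0 for g in gaps):
        raise ValueError("lsp must be strictly increasing")
    extent = lsp[-1] - lsp[0]
    # Expansion: any gap narrower than the reference is raised to the reference.
    widened = [max(g, ref_distance) for g in gaps]
    # Compression: one common factor applied to every gap restores the extent.
    scale = extent / sum(widened)
    out = [lsp[0]]
    for g in widened:
        out.append(out[-1] + g * scale)
    return out
```

After the common compression a widened gap may again fall slightly below the reference distance, but it stays wider than it was originally whenever any gap was expanded, while the first and last dimensions keep their positions.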

13. A filter according to claim 3, wherein

said spectral information is LSP information, and wherein said modification means includes:

proportional division means for proportionally dividing, in accordance with a modified coefficient, said spectral
information and reference information belonging to the very same domain to which said spectral information
belongs;

distance expansion means for expanding distances between adjacent dimensions among a plurality of
dimensions representative of said spectral information; and

switching means for selectively using either said proportional division means or said distance expansion
means to generate said modified spectral information.

14. A filter according to claim 3, wherein

said spectral information is LSP information, and wherein

said modification means includes:

proportional division means for proportionally dividing said spectral information and reference information
belonging to the very same domain to which said spectral information belongs in accordance with a modified
coefficient;

distance expansion means for expanding distances between adjacent dimensions among a plurality of dimen- 
sions representative of said spectral information; and 

cascade connection means for using both said proportional division means and said distance expansion 
means in cooperation to generate said modified spectral information. 

15. A filter according to claim 3, wherein

said modification means includes a translation table for storing said spectral information in correlation with
said modified spectral information, said translation table generating said modified spectral information in
response to the supply of said spectral information.

16. A filter according to claim 3, wherein

said modification means includes a neural network which has acquired, by learning, an ability to transform
said spectral information into said modified spectral information, said neural network generating said modified
spectral information in response to the supply of said spectral information.

17. A filter according to claim 3, wherein

said modification means includes:

a plurality of category specific modification means, one provided for each of a plurality of categories which do
not overlap one another and which are obtained by classifying said predetermined domain;

said plurality of category specific modification means each including:

means for modifying said spectral information within a corresponding category to generate modified spectral
information; and

means for transforming said modified spectral information from said predetermined domain into an LPC domain
to generate a filter coefficient.

18. A filter according to claim 3, wherein

said modification means includes:

means for modifying, in accordance with a modified coefficient, said spectral information within said
predetermined domain to generate modified spectrum information;

means for transforming said modified spectrum information from said predetermined domain into an LPC
domain to generate said filter coefficients; and

means for adjusting said modified coefficient in accordance with which category said spectral information
belongs to among a plurality of categories, which are obtained by dividing said predetermined domain and
which do not overlap one another.

19. A filter according to claim 1, wherein

said filter coefficients belong to any one of an LSP domain and a PARCOR domain.

20. A filter according to claim 19, wherein

said filter coefficient generation means includes:

modification means for modifying said spectral information within said predetermined domain to generate
modified spectral information; and

means for supplying said modified spectral information as said filter coefficients into said filtering means.

21. A filter according to claim 1, wherein

said filtering means includes a synthesis filter for implementing the denominator of said transfer function so
as to ensure that formant characteristics of said modified synthesized speech signals are enhanced compared with
those of said synthesized speech signals.

22. A filter according to claim 21, wherein

said filtering means further includes an inverse filter for suppressing a spectral gradient imparted to said
modified synthesized speech signals by said synthesis filter.
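The arrangement of claims 21 and 22 can be pictured as a pole-zero filter in which the synthesis filter contributes the denominator of the transfer function. The Python sketch below is a generic direct-form illustration, not the specification's implementation: it assumes that the inverse filter forms the numerator, that both polynomials have a leading coefficient of 1, and that the coefficient lists `num` and `den` (hypothetical names) have already been derived from the modified spectral information.

```python
def pole_zero_filter(x, num, den):
    """Apply H(z) = (1 + sum num[i] z^-(i+1)) / (1 + sum den[i] z^-(i+1)) to x.

    num holds the inverse-filter (zero) coefficients and den the
    synthesis-filter (pole) coefficients, both without the leading 1.
    """
    y = []
    for n in range(len(x)):
        acc = x[n]
        # Feed-forward part: inverse filter acting on past inputs.
        for i, b in enumerate(num, start=1):
            if n - i >= 0:
                acc += b * x[n - i]
        # Feedback part: synthesis filter acting on past outputs.
        for i, a in enumerate(den, start=1):
            if n - i >= 0:
                acc -= a * y[n - i]
        y.append(acc)
    return y
```

When `num` and `den` are identical the two parts cancel and the input passes through unchanged, which gives a quick check that the numerator offsets whatever the denominator imparts.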

23. A speech synthesizing apparatus comprising: 

means for generating synthesized speech signals on the basis of spectral information represented in the form
of a multi-dimensional vector and belonging to a predetermined domain and pertaining to input speech signals;

means for filtering synthesized speech signals through a transfer function defined by filter coefficients to
generate modified synthesized speech signals; and

means for generating said filter coefficients on the basis of said spectral information in such a manner that
formant characteristics of said modified synthesized speech signals are enhanced in accordance with said
spectral information and in comparison with those of said synthesized speech signals;

said spectral information being any one of LSP information, PARCOR information and LAR information.

24. A speech synthesizing apparatus comprising:

means for generating a synthesized speech signal on the basis of first spectral information represented in the
form of a multi-dimensional vector and belonging to a predetermined domain and pertaining to input speech
signals;

means for transforming said first spectral information into second spectral information belonging to a different
domain from said predetermined domain;

means for filtering synthesized speech signals through a transfer function defined by filter coefficients to
generate modified synthesized speech signals; and

means for generating said filter coefficients on the basis of said second spectral information so as to ensure
that formant characteristics of said modified synthesized speech signals are enhanced in accordance with said
second spectral information and in comparison with those of said synthesized speech signals;

said spectral information being any one of LSP information, PARCOR information and LAR information.

25. A speech synthesizing apparatus comprising: 

means for generating synthesized speech signals on the basis of first spectral information represented in the
form of a multi-dimensional vector and belonging to a predetermined domain and pertaining to input speech
signals;

means for analyzing said synthesized speech signals to generate second spectral information;

means for filtering synthesized speech signals through a transfer function defined by filter coefficients to
generate modified synthesized speech signals; and

means for generating said filter coefficients on the basis of said second spectral information so as to ensure
that formant characteristics of said modified synthesized speech signals are enhanced in accordance with
said second spectral information and in comparison with those of said synthesized speech signals;

said spectral information being any one of LSP information, PARCOR information and LAR information.

26. A speech storage/transmission system comprising: 

means for analyzing input speech signals to generate spectral information represented in the form of a
multi-dimensional vector and belonging to a predetermined domain and pertaining to said input speech signals;

means for storing or transmitting said spectral information;

means for generating synthesized speech signals on the basis of said spectral information which has been
stored or transmitted;

means for filtering said synthesized speech signals through a transfer function defined by filter coefficients to
generate modified synthesized speech signals; and

means for generating said filter coefficients on the basis of said spectral information so as to ensure that
formant characteristics of said modified synthesized speech signals are enhanced in accordance with said spectral
information and in comparison with those of said synthesized speech signals;

said spectral information being any one of LSP information, PARCOR information and LAR information.

27. A speech storage/transmission system comprising: 

means for analyzing input speech signals to generate first spectral information represented in the form of a 
multi-dimensional vector and belonging to a predetermined domain and pertaining to said input speech sig- 
nals; 

means for storing or transmitting said first spectral information; 

means for generating a synthesized speech signal on the basis of said first spectral information which has
been stored or transmitted;

means for transforming said first spectral information into second spectral information belonging to a different
domain from said predetermined domain;

means for filtering said synthesized speech signals through a transfer function defined by filter coefficients to
generate modified synthesized speech signals; and

means for generating said filter coefficients on the basis of said second spectral information so as to ensure
that formant characteristics of said modified synthesized speech signals are enhanced in accordance with said
second spectral information and in comparison with those of said synthesized speech signals;

said spectral information being any one of LSP information, PARCOR information and LAR information.

28. A speech storage/transmission system comprising:

means for analyzing input speech signals to generate first spectral information represented in the form of a
multi-dimensional vector and belonging to a predetermined domain and pertaining to said input speech signals;

means for storing or transmitting said first spectral information;

means for generating synthesized speech signals on the basis of said first spectral information which has been
stored or transmitted;

means for analyzing said synthesized speech signals to generate second spectral information;

means for filtering said synthesized speech signals through a transfer function defined by filter coefficients to
generate modified synthesized speech signals; and

means for generating said filter coefficients on the basis of said second spectral information so as to ensure
that formant characteristics of said modified synthesized speech signals are enhanced in accordance with said
second spectral information and in comparison with those of said synthesized speech signals;

said spectral information being any one of LSP information, PARCOR information and LAR information.

29. A speech modification method comprising: 

a first step of filtering synthesized speech signals through a transfer function defined by filter coefficients to
generate modified synthesized speech signals; and

a second step of generating said filter coefficients on the basis of spectral information represented by a
multi-dimensional vector and belonging to a predetermined domain and pertaining to said synthesized speech
signals, so as to ensure that formant characteristics of said modified synthesized speech signals are enhanced in
accordance with said spectral information and in comparison with those of said synthesized speech signals;

said second step preceding the execution of said first step;

said spectral information being any one of LSP information, PARCOR information and LAR information.